Selection of Relevant and Non-Redundant Feature Subspaces for Co-training
Abstract
On high-dimensional data sets, choosing subspaces randomly, as in the RASCO (Random Subspace Method for Co-training; Wang et al., 2008) algorithm, may produce diverse but inaccurate classifiers for Co-training. To remedy this problem, we introduce two algorithms for selecting relevant and non-redundant feature subspaces for Co-training. The first algorithm, relevant random subspaces (Rel-RASCO), produces subspaces by drawing features with probability proportional to their relevance, measured by the mutual information between each feature and the class labels. We also modify a successful feature selection algorithm, Minimum Redundancy Maximum Relevance (MRMR), for feature subset selection, yielding the Prob-MRMR feature subset selection scheme. Experiments on five datasets show that the proposed algorithms outperform both RASCO and Co-training in terms of the accuracy achieved at the end of Co-training. A theoretical analysis of the proposed algorithms is also provided.
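The relevance-proportional drawing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mutual_information` and `relevance_weighted_subspaces` are hypothetical, features are assumed to be already discretized, features within one subspace are assumed to be drawn without replacement, and a tiny epsilon is added so that zero-relevance features keep a nonzero draw probability.

```python
import numpy as np


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete 1-D arrays."""
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    # Joint frequency table of (feature value, class label) pairs.
    joint = np.zeros((len(x_vals), len(y_vals)))
    np.add.at(joint, (x_idx, y_idx), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of the feature
    py = joint.sum(axis=0, keepdims=True)  # marginal of the label
    nz = joint > 0                         # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))


def relevance_weighted_subspaces(X, y, n_subspaces, subspace_size, seed=0):
    """Draw feature subspaces with probability proportional to relevance.

    Assumptions (not taken from the paper): X holds discrete features,
    and each subspace is sampled without replacement.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    relevance = np.array(
        [mutual_information(X[:, j], y) for j in range(n_features)]
    )
    # Epsilon keeps every feature drawable even when its relevance is zero.
    weights = relevance + 1e-12
    probs = weights / weights.sum()
    subspaces = [
        np.sort(rng.choice(n_features, size=subspace_size, replace=False, p=probs))
        for _ in range(n_subspaces)
    ]
    return subspaces, relevance
```

Each returned index array could then select the columns used to train one classifier in a Co-training ensemble; how the classifiers are trained and combined is outside the scope of this sketch.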
Similar resources
A New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)
Feature selection is a pre-processing technique used for eliminating the irrelevant and redundant features which results in enhancing the performance of the classifiers. When a dataset contains more irrelevant and redundant features, it fails to increase the accuracy and also reduces the performance of the classifiers. To avoid them, this paper presents a new hybrid feature selection method usi...
Audio Genre Classification with Semi-Supervised Feature Ensemble Learning
Widespread availability and use of music have made automated audio genre classification an important field of research. Thanks to feature extraction systems, not only music data, but also features for them have become readily available. However, hand-labeling of a large amount of music data is time consuming. In this study, we introduce a semi-supervised random feature ensemble method for audio ...
A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts
High dimensional microarray datasets are difficult to classify since they have many features with small number of instances and imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...
A New Iterative Neural Based Method to Spot Price Forecasting
Electricity price predictions have become a major discussion on competitive market under deregulated power system. But, the exclusive characteristics of electricity price such as non-linearity, non-stationary and time-varying volatility structure present several challenges for this task. In this paper, a new forecast strategy based on the iterative neural network is proposed for Day-ahead price...
Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features
Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...
Publication date: 2009